The non-stationary stochastic multi-armed bandit problem
Similar resources
Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards
In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are only observed when an arm is selected, and the gambler’s objective is to maximize his cumulative expected earnings over some given horizon of play T. To do this, the gambler needs to acquire information about arms (ex...
The Multi-armed Bandit Problem: an Efficient Non-parametric Solution
Lai and Robbins (1985) and Lai (1987) provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the Kullback-Leibler information of the reward distributions, estimated from within a specified parametric family. In recent years there has been renewed interest ...
Combinatorial Multi-Objective Multi-Armed Bandit Problem
In this paper, we introduce the COmbinatorial Multi-Objective Multi-Armed Bandit (COMOMAB) problem that captures the challenges of combinatorial and multi-objective online learning simultaneously. In this setting, the goal of the learner is to choose an action at each time, whose reward vector is a linear combination of the reward vectors of the arms in the action, to learn the set of super Par...
The multi-armed bandit problem with covariates
We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewa...
Algorithms for the multi-armed bandit problem
The stochastic multi-armed bandit problem is an important model for studying the exploration-exploitation tradeoff in reinforcement learning. Although many algorithms for the problem are well-understood theoretically, empirical confirmation of their effectiveness is generally scarce. This paper presents a thorough empirical study of the most popular multi-armed bandit algorithms. Three important...
Journal
Journal title: International Journal of Data Science and Analytics
Year: 2017
ISSN: 2364-415X, 2364-4168
DOI: 10.1007/s41060-017-0050-5